Consistent Aggregations in Databases with Referential Integrity Errors

Authors

  • Carlos Ordonez
  • Javier García-García

Abstract

A data warehouse integrates tables coming from multiple source databases, where each database has different tables, columns with similar content across databases, and different referential integrity constraints, enforced to different compliance levels. Some source databases may have more reliable data than others if referential integrity is more strictly enforced or their respective logical data model is more comprehensive. Thus, a query in an integrated database is likely to refer to tables and columns with referential integrity errors. In this work, we improve aggregations to handle referential integrity errors on OLAP databases. Specifically, when two tables are joined, SQL ignores those tuples with invalid foreign key values, effectively discarding potentially valuable information. We extend aggregations to return complete answer sets in the sense that no tuple is excluded. Two families of extended aggregations are proposed: weighted referential aggregations and full referential aggregations, which return an approximate answer set and perform a dynamic repair, respectively. Finally, we introduce a simple method to improve aggregation accuracy. Experiments analyze approximation accuracy and time performance of our extended aggregations on a synthetic database, comparing them with standard SQL aggregations on databases with varying referential error rates. The extra work to compute extended aggregations is reasonable and approximate answer sets are highly accurate, making our aggregations a good alternative to standard aggregations in a data warehouse.
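The Python sketch below contrasts a standard inner-join SUM with two extended variants in the spirit of the abstract. It is not the authors' implementation. The toy data, the function names (`standard_sum`, `full_sum`, `weighted_sum`), mapping every invalid foreign key to a single undefined group (`None`), and weighting by the frequency of each valid key are all illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical toy data: fact rows are (foreign_key, measure) pairs.
# Keys 99 and None match no dimension row, so they are referential errors.
dimension_keys = {1, 2, 3}
fact_rows = [(1, 10.0), (1, 30.0), (2, 20.0), (3, 40.0), (99, 50.0), (None, 50.0)]


def standard_sum(facts, dim_keys):
    """SUM grouped by key after an inner join: tuples with invalid keys are dropped."""
    totals = defaultdict(float)
    for fk, amount in facts:
        if fk in dim_keys:
            totals[fk] += amount
    return dict(totals)


def full_sum(facts, dim_keys):
    """Assumed 'full' variant: keep every tuple by repairing invalid keys
    to one undefined group, represented here as None."""
    totals = defaultdict(float)
    for fk, amount in facts:
        totals[fk if fk in dim_keys else None] += amount
    return dict(totals)


def weighted_sum(facts, dim_keys):
    """Assumed 'weighted' variant: spread the measure of invalid tuples over
    valid keys in proportion to how often each valid key occurs."""
    valid_totals = standard_sum(facts, dim_keys)
    counts = defaultdict(int)
    invalid_total = 0.0
    for fk, amount in facts:
        if fk in dim_keys:
            counts[fk] += 1
        else:
            invalid_total += amount
    n_valid = sum(counts.values())
    if n_valid == 0:
        # No valid tuples: there is no distribution to weight by.
        return {}
    return {k: valid_totals[k] + invalid_total * counts[k] / n_valid for k in valid_totals}


if __name__ == "__main__":
    print("standard:", standard_sum(fact_rows, dimension_keys))
    print("full:    ", full_sum(fact_rows, dimension_keys))
    print("weighted:", weighted_sum(fact_rows, dimension_keys))
```

With this data the standard SUM accounts for only 100.0 of the 200.0 total, while both extended variants account for all of it. The full variant reports the missing 100.0 under the undefined group; the weighted variant spreads it as 50.0, 25.0 and 25.0 over keys 1, 2 and 3, which yields an approximate rather than exact per-group answer.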

Similar resources

Extended aggregations for databases with referential integrity issues

Querying databases with incomplete or inconsistent content remains a broad and difficult problem. In this work, we study how to improve aggregations computed on databases with referential errors in the context of database integration, where each source database has different tables, columns with similar content across multiple databases, but different referential integrity constraints. Thus, a ...

Referential integrity quality metrics

Referential integrity is an essential global constraint in a relational database, that maintains it in a complete and consistent state. In this work, we assume the database may violate referential integrity and relations may be denormalized. We propose a set of quality metrics, defined at four granularity levels: database, relation, attribute and value, that measure referential completeness and...

A Referential Integrity Browser for Distributed Databases

We demonstrate a program that can inspect a distributed relational database on the Internet to discover and quantify referential integrity issues for integration purposes. The program computes data quality metrics for referential integrity at four granularity levels: database, table, column and value, going from a global to a detailed view, exhibiting specific evidence about referential errors....

Towards a global biological information infrastructure

This paper analyzes the problems of working with large, mixed-origin taxonomic databases. The analyses were based in an example of a database that included more than 50 000 specimens of Papilionidae and Pieridae butterflies of Mexico, obtained from ca. twenty different museums. The major problems and errors present in this database were classified as errors of structure, consistency...

Referencial Integrity Model for XML Data Integrated from Heterogeneous Databases Systems - Using the Power of XML for Consistent Data Integration

This article presents a proposal for maintenance of the referential integrity in data integrated from relational heterogeneous databases stored in XML materialized views. The central idea is the creation of a repository of rules that will have to be observed to if carrying through any operation of update in the mediating layer of a system for integration of heterogeneous relational sources of d...

Publication date: 2011